Modeling skewed distributions using multifractals and the '80-20 law'
Abstract
The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest $h$ multiplicities $\vec{m} = (m_1, m_2, \ldots, m_h)$ and some frequency moments $F_q = \sum_i m_i^q$ (e.g., $q = 0, 2$), we provide effective schemes for obtaining estimates about either its statistics or subsets/supersets of the relation. We assume an 80/20 law, and specifically, a $p/(1-p)$ law. This law gives a distribution which is commonly known in the fractals literature as 'multifractal'. We show how to estimate $p$ from the given information (the first few multiplicities and a few moments), and present the results of our experiments on real data. Our results demonstrate that schemes based on our multifractal assumption consistently outperform those based on the uniformity assumption, which are commonly used in current DBMSs. Moreover, our schemes can be used to provide estimates for supersets of a relation, which the schemes based on the uniformity assumption cannot provide at all.
This work was partially supported by the National Science Foundation under Grants No. CDR-8803012, EEC-9402384, IRI-8958546 and IRI-9205273, with matching funds from Empress Software Inc. and Thinking Machines Inc. Part of the work was performed while visiting AT&T Bell Laboratories.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996.
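To make the $p/(1-p)$ law concrete, the following is a minimal sketch, not the paper's own estimator. It assumes the multifractal is a binomial multiplicative cascade: the value domain is halved `levels` times, and at each split a fraction `p` of the tuples goes to one half and `1 - p` to the other. Under that assumption $F_q = N^q \, (p^q + (1-p)^q)^k$ for $N$ tuples and $k$ levels, so $p$ can be recovered from $F_2$ in closed form. All function names, and the parameters `total` and `levels`, are hypothetical and chosen for illustration only.

```python
import math


def multifractal_counts(total, p, levels):
    """Expected multiplicities of a binomial cascade over 2**levels values.

    Assumption: each split sends a fraction p of the tuples to one half
    of the domain and 1 - p to the other (the 'p/(1-p)' law).
    """
    counts = [float(total)]
    for _ in range(levels):
        counts = [c * f for c in counts for f in (p, 1.0 - p)]
    return sorted(counts, reverse=True)


def frequency_moment(counts, q):
    """F_q = sum over values of m_i**q (F_0 counts the distinct values)."""
    return sum(m ** q for m in counts if m > 0)


def estimate_bias_from_f2(f2, total, levels):
    """Invert F_2 = total**2 * (p**2 + (1 - p)**2)**levels, taking p >= 0.5."""
    s = (f2 / total ** 2) ** (1.0 / levels)
    # p**2 + (1 - p)**2 = s  =>  p = (1 + sqrt(2*s - 1)) / 2
    return 0.5 * (1.0 + math.sqrt(max(0.0, 2.0 * s - 1.0)))


# Synthetic 80/20 data: 1,000,000 tuples over 2**10 distinct values.
counts = multifractal_counts(total=1_000_000, p=0.8, levels=10)
f0 = frequency_moment(counts, 0)
f2 = frequency_moment(counts, 2)
p_hat = estimate_bias_from_f2(f2, total=1_000_000, levels=10)
```

Under the same cascade assumption, the largest multiplicity is $m_1 = N p^k$, so `(counts[0] / total) ** (1 / levels)` gives a second estimate of `p` from the top multiplicity alone. On real data the two estimates would generally differ, and how to reconcile them is outside the scope of this sketch.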
Similar resources
Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data
When the observations reflect a multimodal, asymmetric or truncated construction, or a combination of these, using the usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions capable of modeling skewness, multimodality and truncation have always been of core interest in the statistical literature. There are different methods to contract ...
Modeling Fractal Structure of City-Size Distributions Using Correlation Functions
Zipf's law is one of the most conspicuous empirical facts for cities; however, there is no convincing explanation for the scaling relation between rank and size and its scaling exponent. Using the idea from general fractals and scaling, I propose a dual competition hypothesis of city development to explain the value intervals and the special value, 1, of the power exponent. Zipf's law and Pareto's...